Abstract

We review the next generation of global PDF sets: NNPDF3.0, MMHT14 and CT14. We
describe the global datasets, particularly the new data from LHC Run 1, recent
developments in QCD theory and PDF methodology, improvements in their combination and
delivery, and future prospects for parton determination at Run 2.
1 Next Generation PDFs


In order to make the most of the LHC, we need to be able to compute Standard Model
cross-sections reliably and precisely. These days a wide variety of inclusive hard processes
are known to NLO, and increasingly to NNLO, in perturbative QCD. However, to obtain
a physical cross-section, these must be folded with nonperturbative parton distribution
functions (PDFs). Since the PDFs cannot be computed from first principles, they must
be determined empirically. This is a nontrivial task: the PDFs $g, u, \bar u, d, \bar d, s, \bar s, \ldots$ are
functions of $x$ and $Q^2$, correlated through both theoretical constraints and measurements
from a wide variety of different experiments and processes. Uncertainties in PDFs remain
one of the dominant sources of uncertainty for many important LHC cross-sections.
Recently, the major PDF collaborations have all been using data from LHC Run I to further
constrain PDFs in preparation for Run II.

There are at present three PDF fitting collaborations providing global PDF
determinations. Their most recent sets are NNPDF3.0 [1] (which supersedes the NNPDF2.X sets [4]),
MMHT14 [2] (which now replaces the long-serving MSTW08 set [5]), and CT14 [3] (which
supersedes the CT10 sets [6]). All three combine a wide range of older DIS, neutrino and
Drell-Yan fixed target data with HERA DIS data, Tevatron Drell-Yan, W/Z and jet data,
and now also Drell-Yan, W/Z and jet data from LHC Run I. These data span a kinematic
range of more than four orders of magnitude in $x$ and six orders of magnitude in $Q^2$, and
the wide range of different processes is together sufficient to extract all PDF
combinations without theoretical assumptions beyond those embodied in fixed order perturbative
QCD. By contrast, the ABM sets [7] are based only on DIS and Drell-Yan data, with no
data from the Tevatron, and have difficulties extrapolating up to LHC energies, while the
HERA PDFs [8,9] use only HERA data, and consequently have larger uncertainties than
the global sets [10]. In this short review we thus consider in detail only the three most
recent global sets.

2 Global Datasets 

A detailed comparison of the datasets used in each of the three most recent global fits
is presented in Table 1, together with the total number of data points used. The most
striking feature of the table is that while the three collaborations have different detailed
preferences, the global datasets are broadly similar in scope and coverage. Thus, while
CT14 does not use the recent CHORUS ν-DIS data, it retains the older CDHSW and
CCFR data. While all three collaborations now use the combined HERA-I data, only
NNPDF3.0 also uses HERA-II data.¹ NNPDF prefer not to use D0 jet data, which were
analysed with the midpoint algorithm, which is infrared unsafe and thus cannot be used
with NNLO calculations. All three collaborations now use a significant amount of LHC
Run I data, though CT14 has yet to include the CMS double differential Drell-Yan data
or the $t\bar t$ total cross-section. And so on.

It is expected that over the next few years many more LHC datasets will be added to 
this list, some of them improvements on existing measurements, others more novel [11].

¹ The combined HERA-II data have only been published very recently [9], and will no doubt be
incorporated in due time. Preliminary analyses by MMHT and NNPDF suggest that their impact will be
small.
Dataset                NNPDF3.0   MMHT14   CT14 (prel.)
SLAC p,d DIS           ✓          ✓        ✗
BCDMS p,d DIS          ✓          ✓        ✓
NMC p,d DIS            ✓          ✓        ✓
E665 p,d DIS           ✗          ✓        ✗
CDHSW ν-DIS            ✗          ✗        ✓
CCFR ν-DIS             ✗          ✓        ✓
CHORUS ν-DIS           ✓          ✓        ✗
CCFR dimuon            ✗          ✓        ✓
NuTeV dimuon           ✓          ✓        ✓
HERA I NC,CC           ✓          ✓        ✓
HERA I charm           ✓          ✓        ✓
H1, ZEUS jets          ✗          ✓        ✗
H1 HERA II             ✓          ✗        ✗
ZEUS HERA II           ✓          ✗        ✗
E605 & E866 FT DY      ✓          ✓        ✓
CDF & D0 W asym        ✗          ✓        ✓
D0 Run II W asym       ✗          ✗        ✓
CDF & D0 Z rap         ✓          ✓        ✓
CDF Run-II jets        ✓          ✓        ✓
D0 Run-II jets         ✗          ✓        ✓
ATLAS high-mass DY     ✓          ✓        ✓
CMS 2D DY              ✓          ✓        ✗
ATLAS W,Z rap          ✓          ✓        ✓
ATLAS W pT             ✓          ✓        ✗
CMS W asym             ✓          ✓        ✓
CMS W+c                ✓          ✗        ✗
LHCb W,Z rap           ✓          ✓        ✓
ATLAS jets             ✓          ✓        ✓
CMS jets               ✓          ✓        ✓
ttbar total xsec       ✓          ✓        ✗
Total NLO              4276       2996     2947
Total NNLO             4078       2663     2947


Table 1: Data included in the latest NLO and NNLO global PDF sets, and the total 
number of data points in each fit. 


Light flavour separation will be improved by differential high- and low-mass Drell-Yan data,
and by more accurate W/Z asymmetries and rapidity distributions, while better W+c data
will pin down strangeness, and Z+c and Z+b data will assist the direct determination of heavy
quark distributions. The gluon at medium and large $x$ will be further constrained by
differential top production, inclusive jets and dijets, prompt photons, and W/Z + jets.

All three collaborations producing global fits now make full use of experimental
systematics when implementing new datasets. These systematics can be either additive or
multiplicative: multiplicative systematic uncertainties need careful treatment in order to
avoid the well-known d’Agostini bias.
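The bias can be reproduced with the classic two-measurement example (toy numbers chosen here for illustration, not taken from any dataset): two measurements of the same constant, 8.0 and 8.5, each with a 2% uncorrelated error and a common 10% normalisation uncertainty. In the minimal sketch below, building the multiplicative part of the covariance matrix from the measured values pulls the fitted constant below both measurements; scaling it instead by the current theory value, in the spirit of the iterative $t_0$ prescription used by NNPDF, removes the effect. The function names are hypothetical.

```python
import numpy as np

def fit_constant(data, cov):
    """Generalised least-squares fit of a single constant k to the data."""
    cinv = np.linalg.inv(cov)
    ones = np.ones(len(data))
    return float(ones @ cinv @ data) / float(ones @ cinv @ ones)

def covariance(stat, norm_frac, scale):
    """Uncorrelated errors on the diagonal plus a fully correlated multiplicative
    normalisation uncertainty norm_frac applied to the vector `scale`."""
    return np.diag(stat**2) + norm_frac**2 * np.outer(scale, scale)

data = np.array([8.0, 8.5])   # toy measurements of the same constant
stat = 0.02 * data            # 2% uncorrelated errors
norm = 0.10                   # 10% common normalisation uncertainty

# Naive: multiplicative systematic scaled by the measured values (biased low)
k_naive = fit_constant(data, covariance(stat, norm, data))

# t0-like: scale the multiplicative systematic by the theory value and iterate
k_t0 = data.mean()
for _ in range(10):
    k_t0 = fit_constant(data, covariance(stat, norm, np.full(len(data), k_t0)))

print(f"naive fit: {k_naive:.3f}")
print(f"t0 fit:    {k_t0:.3f}")
```

The naive fit lands at about 7.87, below both data points, while the $t_0$-style fit gives about 8.23, between them. For a single fitted constant the correlated term then adds the same amount to every covariance entry, which leaves the least-squares estimate unchanged, so the iteration converges at once; in a real PDF fit the $t_0$ predictions are a full set of theory values updated between fits.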
3 Theory and Methodology

3.1 Theory 

Each of the three collaborations now produces families of fits at LO (for use in Monte Carlo event generators),
NLO and NNLO in perturbative QCD. All now fit the total strangeness distribution $s + \bar s$, and
NNPDF and MMHT also attempt to fit the strange valence distribution $s - \bar s$. None of the currently
available sets includes a fitted charm distribution, though there have been recent studies
by CTEQ [13]. All three collaborations use a GM-VFNS for heavy quark distributions
(FONLL for NNPDF3.0, TR' for MMHT14 and S-ACOT for CT14, differing only by
subleading terms [14]): this is essential for accurate extrapolation to the high scales of
many LHC measurements [10,15]. The PDF sets are each determined using $\alpha_s(M_Z) = 0.118$,
but sets with $\alpha_s$ either side of this value (at intervals of 0.001) are also provided for the
determination of $\alpha_s$ uncertainties. The collaborations also have their own preferred values of
$\alpha_s$: at NNLO these are 0.1173 ± 0.0007, 0.1172 ± 0.0013 and $0.115^{+0.006}_{-0.004}$ [3]. There is
as yet no consensus on the input values of $m_c$ and $m_b$, or on whether to use $\overline{\rm MS}$ or pole
masses.

                     NNPDF3.0                   MMHT14                   CT14
No. of fitted PDFs   7                          7                        6
Parametrization      x^a (1-x)^b × neural nets  x^a (1-x)^b × Chebyshev  x^a (1-x)^b × Bernstein
Free parameters      259                        37                       28
Uncertainties        MC replicas                Hessian                  Hessian
Tolerance            None                       Dynamical                Dynamical
Closure test         ✓                          ✗                        ✗
Reweighting          replicas                   eigenvectors             eigenvectors

Table 2: Main methodological features of various global PDF sets.

An important limitation on the usefulness of hadronic data in constraining PDFs
is the availability of NNLO corrections. The recent calculation of the $t\bar t$ total
cross-section to NNLO [18] has had a significant impact on the determination of the gluon
distribution, which is expected to improve further once more differential results become
available. Calculations of the inclusive jet cross-section to NNLO are now available in the
gg and qq channels [19], and the full result is eagerly awaited.

Computationally, new interface tools such as FastNLO [20] and APPLGRID [21] have
been developed to evaluate hadronic cross-sections sufficiently fast to be usable in PDF
fits. These work by precomputing hard cross-sections in lookup tables. Other tools
released recently include a PDF plotting tool APFEL [22] and a general purpose fitting tool
HERAfitter [23]. The impact of new datasets on PDFs may be estimated using Bayesian
reweighting [24] or PDF profiling [11], as implemented in HERAfitter.
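The lookup-table idea behind tools such as FastNLO [20] and APPLGRID [21] can be sketched in one dimension, for a DIS-like observable $\sigma = \int dx\, C(x) f(x)$: expand the PDF on interpolation nodes, $f(x) \approx \sum_i f(x_i) L_i(x)$, precompute the weights $W_i = \int dx\, C(x) L_i(x)$ once, and then evaluate $\sigma \approx \sum_i W_i f(x_i)$ at every step of the fit. For hadronic observables the construction is two-dimensional, $\sigma \approx \sum_{ij} W_{ij} f(x_i) f(x_j)$, and real grids also interpolate in $Q^2$ with higher-order interpolation. This is a minimal sketch under toy assumptions: the coefficient function, node choice and function names are hypothetical and do not reflect either tool's file format or API.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (written out to avoid NumPy-version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Hypothetical toy setup: fine quadrature grid and coarse interpolation nodes
X_FINE = np.geomspace(1e-2, 1.0, 20001)   # used once, for the expensive integral
NODES = np.geomspace(1e-2, 1.0, 60)       # nodes whose weights are stored in the table

def coeff(x):
    """Toy stand-in for an expensive NLO/NNLO partonic coefficient function C(x)."""
    return (1.0 - x) ** 2

def build_table(nodes=NODES, x=X_FINE):
    """Precompute W_i = int dx C(x) L_i(x), with L_i the piecewise-linear basis of node i."""
    weights = np.empty(len(nodes))
    for i in range(len(nodes)):
        unit = np.zeros(len(nodes))
        unit[i] = 1.0
        basis = np.interp(x, nodes, unit)  # 1 at node i, 0 at all other nodes
        weights[i] = trapezoid(coeff(x) * basis, x)
    return weights

def sigma_from_table(weights, pdf, nodes=NODES):
    """Fast evaluation inside a fit: only the PDF values at the nodes are needed."""
    return float(weights @ pdf(nodes))

def sigma_direct(pdf, x=X_FINE):
    """Slow reference: integrate C(x) f(x) directly on the fine grid."""
    return trapezoid(coeff(x) * pdf(x), x)

table = build_table()  # done once, before the fit starts

def soft(x):
    return x**-0.5 * (1.0 - x) ** 3

def hard(x):
    return x**0.5 * (1.0 - x) ** 5

for pdf in (soft, hard):
    print(pdf.__name__, sigma_direct(pdf), sigma_from_table(table, pdf))
```

The table is built once; each later PDF only costs a dot product over the nodes, which is what makes repeated evaluation inside a minimisation affordable.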
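Bayesian reweighting [24] can be sketched in a few lines, assuming the weight formula in the form used by NNPDF: each replica $k$ receives a weight $w_k \propto (\chi^2_k)^{(n-1)/2} e^{-\chi^2_k/2}$, where $\chi^2_k$ compares replica $k$ with the $n$ new data points, normalised so that the weights sum to the number of replicas $N$. The effective number of replicas, $N_{\rm eff} = \exp\{\frac{1}{N}\sum_k w_k \ln(N/w_k)\}$, signals when the new data carry too much information for reweighting to be reliable. The single-measurement example and the helper names are hypothetical.

```python
import numpy as np

def nnpdf_weights(chi2, n_data):
    """Weights w_k proportional to chi2_k^((n-1)/2) exp(-chi2_k/2), normalised to sum to N_rep."""
    chi2 = np.asarray(chi2, dtype=float)
    log_w = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
    log_w -= log_w.max()          # guard against overflow/underflow before exponentiating
    w = np.exp(log_w)
    return w * len(w) / w.sum()

def effective_replicas(w):
    """N_eff = exp[(1/N) sum_k w_k ln(N / w_k)]; replicas with w_k = 0 contribute nothing."""
    n = len(w)
    w = w[w > 0]
    return float(np.exp(np.sum(w * np.log(n / w)) / n))

# Toy prior: 2000 replicas of one PDF-dependent quantity
rng = np.random.default_rng(0)
replica_values = rng.normal(1.0, 0.1, size=2000)
new_data, new_err = 1.05, 0.05   # hypothetical single new measurement
chi2 = ((replica_values - new_data) / new_err) ** 2

w = nnpdf_weights(chi2, n_data=1)
posterior_mean = np.average(replica_values, weights=w)
print(posterior_mean, effective_replicas(w))
```

For this Gaussian toy the exact posterior mean is 1.04 (prior 1.0 ± 0.1 combined with 1.05 ± 0.05), so the reweighted mean should come out close to that value, at the price of an effective ensemble smaller than 2000.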


3.2 Methodology 

Considerable progress has been made over the years in the methodology used to determine
PDFs and their uncertainties (see Table 2). The Hessian method adopts a fixed
parametrization, with uncertainties determined through diagonalization of the Hessian matrix. As data
become more precise, the parametrization must become more flexible, and MMHT14 and CT14
have recently introduced Chebyshev and Bernstein polynomials, respectively, into their
parametrizations for this purpose. If $\Delta\chi^2 = 1$ is used to determine uncertainties in this method, PDF errors
turn out to be unrealistically small: consequently both collaborations use a tolerance
criterion, in which uncertainties are inflated dynamically for each eigenvector in turn in order
to maintain errors consistent with those of the data. There has been much speculation
as to whether tolerance is required because of defects of the methodology (in particular
the limitations of a fixed parametrization), or whether it is due to data inconsistencies
or defects of the theoretical tools (in particular fixed order perturbative QCD) used to
describe the data [26].
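A minimal sketch of the Hessian eigenvector construction, for a toy model linear in its parameters (a quadratic fitted to ten synthetic points; all names and numbers are hypothetical). Each eigenvector $v_k$ of the $\chi^2$ curvature matrix $M$ is displaced by $T_k v_k/\sqrt{\lambda_k}$, so that $\Delta\chi^2 = T_k^2$ along it, and the uncertainty on a prediction $F$ follows from the symmetric master formula $\Delta F = \frac12\sqrt{\sum_k [F(S_k^+) - F(S_k^-)]^2}$. Passing an array of $T_k$ mimics the shape of a dynamical tolerance, but how MMHT14 and CT14 actually choose their $T_k$ is not modelled here.

```python
import numpy as np

def chi2(p, design, sigma, data):
    """Uncorrelated chi^2 of the linear model design @ p against the data."""
    return float(np.sum(((data - design @ p) / sigma) ** 2))

def hessian_eigenvector_sets(design, sigma, data, tolerance=1.0):
    """Best fit plus (S_k^+, S_k^-) parameter pairs along each eigenvector.

    `tolerance` may be a scalar T or an array of T_k, one per eigenvector.
    """
    weight = np.diag(1.0 / sigma**2)
    curvature = design.T @ weight @ design        # chi^2 = chi2_min + dp^T M dp
    p_best = np.linalg.solve(curvature, design.T @ weight @ data)
    lam, vecs = np.linalg.eigh(curvature)         # columns of vecs are eigenvectors
    shifts = np.asarray(tolerance, dtype=float) * vecs / np.sqrt(lam)  # column k: T_k v_k / sqrt(lam_k)
    sets = [(p_best + shifts[:, k], p_best - shifts[:, k]) for k in range(len(lam))]
    return p_best, curvature, sets

def master_formula(observable, sets):
    """Symmetric Hessian uncertainty on an observable F(p)."""
    return 0.5 * float(np.sqrt(sum((observable(sp) - observable(sm)) ** 2 for sp, sm in sets)))

# Toy usage: quadratic model fitted on [0, 1], extrapolated to x = 2
x = np.linspace(0.0, 1.0, 10)
design = np.vstack([np.ones_like(x), x, x**2]).T
sigma = np.full_like(x, 0.1)
data = 1.0 + 2.0 * x + 0.5 * x**2 + np.random.default_rng(1).normal(0.0, 0.1, x.size)
g = np.array([1.0, 2.0, 4.0])

def observable(p):
    return float(g @ p)

p_best, curvature, sets = hessian_eigenvector_sets(design, sigma, data, tolerance=1.0)
print(observable(p_best), master_formula(observable, sets))
```

With $T = 1$ this reproduces ordinary linear error propagation, $\Delta F = \sqrt{g^{\rm T} M^{-1} g}$; a common tolerance $T$ simply scales every uncertainty by $T$.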


[Figure 1 panels: upper, distribution of $\chi^2$ for experiments (closure test and MSTW2008nlo, total and central $\chi^2$); lower, $xg(x, Q^2)$, Level 2 reweighted vs Level 2 fitted.]

Figure 1: Some results from a closure test: $\chi^2$ values for different datasets (upper) and a
reweighting test of the gluon distribution (lower) [1].


The NNPDF collaboration instead uses a Monte Carlo method [25], in which fits are
made to data replicas using a very redundant parametrization (for which NNPDF use a
neural network). These fits give an ensemble of PDF replicas, each of which is equally
probable, and which may thus be used to determine central values, uncertainties, correlations,
etc. There is no assumption in this method that the PDF uncertainties are Gaussian.
Moreover, since there is no $\Delta\chi^2$ criterion, there is no need for tolerance. The redundancy
of the parametrization ensures freedom from parametrization bias.
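The replica mechanics can be sketched with the neural network replaced by a three-parameter linear model, purely to keep the example short (this gives up exactly the redundancy emphasised above, so it illustrates only how replicas are generated and used). Pseudo-data replicas are drawn from the experimental covariance matrix, each replica is fitted independently, and the spread of any derived quantity over the ensemble gives its uncertainty without any $\Delta\chi^2$ criterion. The dataset, observable and function names are hypothetical.

```python
import numpy as np

def make_replicas(central, cov, n_rep, rng):
    """Gaussian pseudo-data replicas: central + L z, with L L^T = cov and z ~ N(0, 1)."""
    chol = np.linalg.cholesky(cov)
    return central + rng.standard_normal((n_rep, len(central))) @ chol.T

def fit_replicas(design, cov, replicas):
    """Generalised least-squares fit of a linear toy model to every replica at once."""
    cinv = np.linalg.inv(cov)
    normal = design.T @ cinv @ design
    return np.linalg.solve(normal, design.T @ cinv @ replicas.T).T   # shape (n_rep, n_par)

# Hypothetical toy dataset: 10 points, 5% uncorrelated errors plus one correlated systematic
rng = np.random.default_rng(0)
x = np.linspace(0.1, 1.0, 10)
design = np.vstack([np.ones_like(x), x, x**2]).T
data = 1.0 + 2.0 * x - 0.5 * x**2
cov = np.diag((0.05 * data) ** 2) + np.outer(0.03 * x, 0.03 * x)

params = fit_replicas(design, cov, make_replicas(data, cov, 2000, rng))
g = np.array([1.0, 1.5, 2.25])   # observable: the model extrapolated to x = 1.5
pred = params @ g                # one prediction per replica
print(pred.mean(), pred.std(ddof=1))
```

For a linear model with Gaussian errors the replica spread converges to the Hessian result $\sqrt{g^{\rm T} M^{-1} g}$; the advantages of the Monte Carlo approach appear for nonlinear parametrizations and non-Gaussian distributions, which this toy does not exercise.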

The NNPDF methodology has recently been subjected to a closure test [1]. The idea
behind this is that if both the data and the theory used to describe them were ‘perfect’,
and thus free from any inconsistencies, a fit to these data should also be perfect: any
defects in the PDFs would then be due entirely to imperfections in the methodology. So in
a closure test we assume a given theory (e.g. NLO QCD) and a given prior PDF set $f_0$ (e.g.
MSTW08), and then generate a set of $N$ pseudodata by Monte Carlo, using the assumed
theory, $f_0$, and the statistical and systematic uncertainties from a typical global dataset
(to ensure the test is as realistic as possible). These perfect pseudodata, together with
their uncertainties, are then fitted to yield a fitted PDF set $f$: if the fitting methodology
were perfect, we would then find that $\chi^2 = N$, and $f = f_0$ within the PDF uncertainties
determined in the fit.

Figure 2: The gg, $q\bar q$ and qq luminosities (top to bottom) at the LHC with centre-of-mass
energy 13 TeV, as predicted by the three global PDF sets NNPDF3.0, MMHT14 and CT14,
normalised to NNPDF3.0.

Results from a typical closure test are shown in Fig. 1: current NNPDF methodology
passes the closure test, in the sense that methodological uncertainties have been
demonstrated to be considerably smaller than data and theory uncertainties. This means that
uncertainties in NNPDF fits are true statistical uncertainties: the NNPDF probability
distributions are a genuine consequence of the prior data and theory that go into the fit.
It would be interesting to also subject the Hessian method to a closure test: in this way
it may be possible to understand better the reason for the need for dynamical tolerance,
and whether there is any residual bias in central values due to the fixed parametrization.
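The statistical logic of a closure test can be sketched with a toy in which the "theory" is linear in its parameters and the fit is generalised least squares (both simplifying assumptions; the actual NNPDF test fits neural networks to pseudodata generated from a prior PDF set). Because the pseudodata are generated from a known truth, one can check over many repetitions that (i) $\chi^2/N$ approaches its expected value, which for a linear toy is $(N - n_{\rm par})/N$ and tends to 1 when $N \gg n_{\rm par}$, and (ii) the truth lies inside the fitted one-sigma band in about 68% of runs. All names and numbers are hypothetical.

```python
import numpy as np

def closure_run(design, cov, p_true, g, rng):
    """One closure test with a linear toy theory: returns (chi2 / N, |pull| of F = g.p)."""
    chol = np.linalg.cholesky(cov)
    truth = design @ p_true                                   # exact predictions of the assumed prior
    pseudo = truth + chol @ rng.standard_normal(len(truth))   # pseudodata with realistic errors
    cinv = np.linalg.inv(cov)
    normal = design.T @ cinv @ design
    p_fit = np.linalg.solve(normal, design.T @ cinv @ pseudo)
    resid = pseudo - design @ p_fit
    chi2_per_point = float(resid @ cinv @ resid) / len(pseudo)
    sigma_f = float(np.sqrt(g @ np.linalg.solve(normal, g)))  # fitted uncertainty on F
    pull = abs(float(g @ (p_fit - p_true))) / sigma_f
    return chi2_per_point, pull

# Hypothetical toy setup: 100 points, absolute errors of 0.05 plus one correlated systematic
rng = np.random.default_rng(7)
x = np.linspace(0.01, 1.0, 100)
design = np.vstack([np.ones_like(x), x, x**2]).T
p_true = np.array([1.0, 2.0, -0.5])
cov = np.diag(np.full(x.size, 0.05**2)) + np.outer(0.03 * x, 0.03 * x)
g = np.array([1.0, 0.5, 0.25])   # observable: the model evaluated at x = 0.5

runs = np.array([closure_run(design, cov, p_true, g, rng) for _ in range(1000)])
mean_chi2 = runs[:, 0].mean()
coverage = np.mean(runs[:, 1] < 1.0)   # fraction of runs with the truth inside 1 sigma
print(mean_chi2, coverage)
```

A methodology that underestimated its uncertainties would show coverage well below 68%; a biased one would show pulls systematically away from zero.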

4 Results 

For descriptions and plots of the latest global PDFs, and the quality of their description
of the various datasets, we refer the reader to the original publications [1-3]. Here we
discuss three subjects of particular interest: the predictions for parton luminosities at 13
TeV, the strangeness fraction, and recent progress in combination and delivery.

4.1 Luminosities 

Predictions for the gg, $q\bar q$ and qq luminosities at the LHC with centre-of-mass energy 13
TeV are shown in Fig. 2. In the central region all three collaborations now make consistent
predictions, with similar uncertainties: this is particularly noticeable in the gg channel,
of direct relevance to Higgs production through gluon fusion and to top production. The $q\bar q$
and qq luminosities are also in broad agreement in the central region, but at high scales
NNPDF lies above the others, with a substantially larger uncertainty. This is because
PDFs at large $x$ are largely unconstrained by data, but must be bounded below by the
positivity of any physical cross-section: the uncertainties are thus asymmetric, and liable
to be underestimated by Hessian treatments. Constraints on luminosities at high invariant
mass are important for putting bounds on new physics, and deserve more careful study [27].
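For reference, parton luminosities such as those in Fig. 2 are commonly defined as $\mathcal{L}_{ij}(M_X^2) = \frac{1}{s}\int_\tau^1 \frac{dx}{x} f_i(x, M_X^2)\, f_j(\tau/x, M_X^2)$ with $\tau = M_X^2/s$; normalisation conventions differ between papers, so this is an assumed convention rather than necessarily the one used for the figure. The minimal sketch below evaluates it at $\sqrt{s} = 13$ TeV for hypothetical, scale-independent toy PDFs.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def parton_luminosity(f_i, f_j, m_x, sqrt_s=13000.0, n=4001):
    """L_ij(M_X) = (1/s) int_tau^1 dx/x f_i(x) f_j(tau/x), tau = M_X^2/s, in GeV units.

    Integrates in y = ln x, so that dx/x = dy.
    """
    s = sqrt_s**2
    tau = m_x**2 / s
    y = np.linspace(np.log(tau), 0.0, n)
    x = np.exp(y)
    return trapezoid(f_i(x) * f_j(tau / x), y) / s

# Hypothetical toy PDFs with no Q^2 dependence (a real calculation uses Q^2 = M_X^2)
def toy_gluon(x):
    return x**-1.1 * (1.0 - x) ** 5

def toy_quark(x):
    return x**-0.5 * (1.0 - x) ** 3

for m in (125.0, 1000.0, 3000.0):
    print(m, parton_luminosity(toy_gluon, toy_gluon, m), parton_luminosity(toy_quark, toy_gluon, m))
```

Raising $M_X$ raises $\tau$ and forces both momentum fractions towards large $x$, which is why the high-mass luminosities are controlled by the poorly constrained large-$x$ region discussed above.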

4.2 Strangeness 

There has been some controversy recently about the strangeness fraction $r_s(x, Q^2)$, with
results from ATLAS W+c data apparently suggesting $r_s = 1$, albeit with large
uncertainties. If confirmed, this would overturn the conventional wisdom that strangeness should be
suppressed due to the strange quark mass. However, CMS do see a suppression at large $x$,
and this is supported by a recent analysis of neutrino data [28] (see Fig. 3). All the global
PDF determinations see strangeness suppression, and a detailed study in the context of
the global fits shows that there is little or no tension between the neutrino data and W+c
data. It will be interesting to see how this situation develops when we have more precise
W+c data from Run 2.

[Figure 3 panels: upper, $r_s$ at $Q^2 = 1.9$ GeV$^2$; lower, NNLO, $\alpha_s = 0.118$, $Q^2 = 2$ GeV$^2$.]

Figure 3: The strangeness fraction $r_s$: results from ATLAS, CMS and the neutrino experiments
NuTeV, NOMAD and CHORUS (upper) and from an NNPDF study in the context of a
global fit (lower). Note that the definitions of $r_s$ used in the two plots are slightly different:
in the upper plot $r_s = (s + \bar s)/2\bar d$, while in the lower plot $r_s = (s + \bar s)/(\bar u + \bar d)$.
4.3 Combination and Delivery


For many years the PDF4LHC recommendation for combining predictions obtained
with different PDF sets has been to compute with each of the three global sets [4-6] and take
the envelope of the resulting predictions [29]. This is a conservative method, appropriate
for the older PDF sets, which displayed some inconsistencies, most noticeably for Higgs
production. It was also time-consuming.

Since the latest global PDFs are much more consistent with each other, it now
becomes possible to combine them statistically into a single PDF set (to be called PDF4LHC15),
which becomes the basis for a new recommendation. The combination is done by
generating 300 replicas for each PDF set (the replicas for the Hessian sets being produced by a
code developed by Thorne and Watt [30]), to give a set of 900 replicas: the prior
assumption in the combination is thus that the global PDFs are not statistically independent, but
that each global set is equally probable. The combination is performed at a fixed value
of $\alpha_s = 0.118$: the $\alpha_s$ uncertainty is treated independently of the PDF uncertainty, and
added in quadrature. The results of the old and new procedures for the Higgs gluon fusion
NNLO cross-section are shown in Fig. 4.
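The pooling step can be sketched for a single scalar prediction rather than for PDFs on an $(x, Q^2)$ grid, which is a simplification. Hessian sets are first converted to replicas with the symmetric linear formula $F^{(k)} = F_0 + \sum_j R_{jk}[F(S_j^+) - F(S_j^-)]/2$, $R_{jk} \sim N(0,1)$, in the spirit of the Thorne-Watt conversion [30]; equal numbers of replicas from each set are then concatenated, and an $\alpha_s$ uncertainty is added in quadrature at the end. All central values and eigenvector shifts below are toy numbers, not predictions of any real PDF set, and the function names are hypothetical.

```python
import numpy as np

def hessian_to_replicas(central, plus, minus, n_rep, rng):
    """Symmetric Hessian-to-MC conversion: central + sum_j R_j (plus_j - minus_j) / 2."""
    half = 0.5 * (np.asarray(plus, dtype=float) - np.asarray(minus, dtype=float))
    r = rng.standard_normal((n_rep, half.size))   # R_jk ~ N(0, 1)
    return central + r @ half

def combine_sets(replica_sets):
    """Pool equal-size replica sets, so each input set carries equal prior weight."""
    pooled = np.concatenate(replica_sets)
    return pooled, float(pooled.mean()), float(pooled.std(ddof=1))

def total_uncertainty(pdf_err, alphas_err):
    """PDF and alpha_s uncertainties treated as independent and added in quadrature."""
    return float(np.hypot(pdf_err, alphas_err))

rng = np.random.default_rng(42)
mc_set = rng.normal(1.00, 0.020, size=300)   # a native Monte Carlo set (toy)
hess_a = hessian_to_replicas(1.02, [1.035, 1.025], [1.005, 1.015], 300, rng)
hess_b = hessian_to_replicas(0.99, [1.005, 0.995, 0.992], [0.975, 0.985, 0.988], 300, rng)

pooled, mean, pdf_err = combine_sets([mc_set, hess_a, hess_b])
print(mean, pdf_err, total_uncertainty(pdf_err, 0.01))
```

Because each set contributes the same number of replicas, the pooled mean is the average of the three set means, and the pooled spread includes both the individual uncertainties and the differences between the sets.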

Since delivery of the full set of 900 replicas is impractical, a number of techniques
have been developed to make the combined set more manageable. A replica compression
technique, which preserves the non-Gaussian features of the underlying probability
distribution, reduces the set of 900 to 100 or fewer, with little loss of accuracy [31]. However, for
many purposes a Hessian representation is preferred, particularly when PDF uncertainties
are to be treated as nuisance parameters. To convert replicas into a Hessian representation, two approaches
have been proposed. The Meta-PDF approach refits to a functional form at a particular
scale, which is then evolved in the usual way [32]. The MC2hessian approach instead uses
the replicas themselves as a basis set, optimised using a genetic algorithm [33]: in this way
no evolution is required, since each replica already contains its own evolution. It is expected
that the PDF4LHC15 set will be delivered in three representations:


• a small Hessian set with only 30 eigenvectors (for applications where high precision 
is not required, such as acceptances, efficiencies or extrapolations); 

• a larger Hessian set with 100 eigenvectors (for PDF uncertainties in precision
calculations); and

• a Monte Carlo set of 100 replicas (for applications where non-Gaussianity may be 
important, for example searches). 


There will be additional eigenvectors and replicas to allow for $\alpha_s$ variations, the results to
be added in quadrature.
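MC2hessian selects its basis among the replicas themselves using a genetic algorithm [33]. As a simpler, related technique (explicitly not the MC2hessian or Meta-PDF algorithm), the sketch below uses principal components, an SVD of the replica deviations, to show how an ensemble of replicas, here a PDF sampled on a hypothetical grid of points, can be re-expressed as symmetric Hessian eigenvectors, and how truncating to fewer eigenvectors trades accuracy for size. Function names and the toy ensemble are hypothetical.

```python
import numpy as np

def replicas_to_hessian(replicas, n_eig):
    """PCA-style conversion of an (n_rep, n_points) ensemble to symmetric eigenvectors."""
    central = replicas.mean(axis=0)
    dev = (replicas - central) / np.sqrt(len(replicas) - 1)   # dev^T dev = covariance
    _, s, vt = np.linalg.svd(dev, full_matrices=False)
    # eigenvector k = central + s_k * vt_k (components ordered by decreasing s_k)
    return central, central + s[:n_eig, None] * vt[:n_eig]

def hessian_error(central, eigvecs):
    """Symmetric Hessian uncertainty: quadrature sum of eigenvector shifts at each point."""
    return np.sqrt(((eigvecs - central) ** 2).sum(axis=0))

# Toy ensemble: 100 correlated replicas of a PDF sampled at 20 grid points
rng = np.random.default_rng(3)
replicas = rng.normal(size=(100, 20)) @ rng.normal(size=(20, 20)) * 0.1 + 1.0

central, eig_full = replicas_to_hessian(replicas, n_eig=20)
_, eig_small = replicas_to_hessian(replicas, n_eig=5)
print(hessian_error(central, eig_full)[:3], replicas.std(axis=0, ddof=1)[:3])
print(hessian_error(central, eig_small)[:3])
```

Keeping all components reproduces the replica standard deviation exactly; truncation can only reduce the variance captured, which is the loss that a compact Hessian representation must keep small.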


5 Future Prospects 

The determination of global PDFs has made significant advances in recent years: in the
inclusion of new and better data (in particular from the LHC), in theoretical advances (driven
particularly by new NNLO calculations and new computational tools), in methodological
developments (more flexible parametrizations, closure testing, reweighting and profiling),
and in presentational improvements (combination and compression). No doubt many of
these lines of development will continue, stimulated by improved data from Run 2.

Figure 4: The Higgs gluon fusion cross-section, computed using the old envelope method
(upper) and the new combination method (lower).


5.1 Variations 


Meanwhile, alongside the mainstream work, there are various side projects aimed at
broadening the scope and applicability of PDF determination. Electroweak corrections can make
substantial contributions to a number of important hadronic processes, particularly W/Z
production and top production. However, a consistent calculation of these effects requires
PDFs with QED corrections, in particular with a fitted photon PDF. A first global
determination of the photon PDF and its uncertainties, using LO QED and NNLO QCD, was
performed recently [34], but uncertainties are still large. The situation may improve in
the future following a more detailed study of processes such as W pair production, which
may further constrain the photon PDF.

Fixed order perturbative QCD becomes increasingly unreliable at large $x$ and small $x$
due to unresummed logarithms. Evidence for the effect of small-$x$ (high energy) logarithms
has been reported by the HERA collaborations [8,9], but as yet there are no global fits which
include the effects of small-$x$ resummation. A global fit which resums large-$x$ (or
threshold) logarithms has, however, been performed recently [35]; it will have implications for searches
for new physics, since the resummation significantly reduces the quark luminosities at high
invariant mass, although its uncertainties are still large.

All three global PDF collaborations attempt to exclude higher twist and (to some
extent) nuclear effects by cutting fixed target data at low $Q^2$ and $W^2$. These cuts are
generally effective [15]. Various attempts have been made to model higher twist and
nuclear effects [2,27,36], one of the aims being to improve the accuracy of PDFs in the
large-$x$ region by a controlled relaxation of the $W^2$ cut. An alternative strategy would be
to eliminate the use of fixed target data altogether, but the uncertainties on collider-only
fits [1,4,8,9] are still too great for them to be competitive with the global fits.
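The effect of the kinematic cuts can be made concrete with the DIS relation $W^2 = m_p^2 + Q^2(1-x)/x$. At fixed $Q^2$, a cut $W^2 \ge W^2_{\min}$ removes every point above $x_{\max} = Q^2/(Q^2 + W^2_{\min} - m_p^2)$, so relaxing the cut directly extends the large-$x$ reach of the fixed target data. In the minimal sketch below the threshold values are placeholders, not the settings of any particular fit, and the function names are hypothetical.

```python
import numpy as np

PROTON_MASS_GEV = 0.938272  # approximate proton mass

def w_squared(x, q2):
    """Hadronic final-state invariant mass squared in DIS, in GeV^2."""
    return PROTON_MASS_GEV**2 + q2 * (1.0 - x) / x

def passes_cuts(x, q2, q2_min=3.5, w2_min=12.5):
    """Keep a DIS point only if both Q^2 and W^2 exceed (placeholder) thresholds in GeV^2."""
    x, q2 = np.asarray(x, dtype=float), np.asarray(q2, dtype=float)
    return (q2 >= q2_min) & (w_squared(x, q2) >= w2_min)

def x_max(q2, w2_min=12.5):
    """Largest x admitted by the W^2 cut at fixed Q^2: x <= Q^2 / (Q^2 + W2_min - m_p^2)."""
    return q2 / (q2 + w2_min - PROTON_MASS_GEV**2)

for w2_min in (12.5, 8.0, 4.0):
    print(w2_min, x_max(10.0, w2_min))
```

At $Q^2 = 10$ GeV$^2$, for example, lowering $W^2_{\min}$ from 12.5 to 4 GeV$^2$ raises $x_{\max}$ from about 0.46 to about 0.76, which is why a controlled treatment of higher twist is attractive for large-$x$ PDFs.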

A first global determination of spin-dependent PDFs and their uncertainties was also
performed recently [37], supplementing polarized DIS data with polarized inclusive jet
and W production data from RHIC. While there is some evidence for a polarized gluon
distribution at large $x$, first moments remain elusive due to the limited small-$x$ reach of
the data.


5.2 Theory Uncertainties 

The global datasets provided by Run 2 will improve on previous data both in precision and
in kinematic range. Methodological uncertainties in PDF fitting have been shown to be
under control, thanks to the closure test. Thus, increasingly, the uncertainty for which we
really have no reliable estimate is the theoretical uncertainty.

There are two categories of theoretical uncertainty. The first are the parametric
uncertainties: uncertainties due to the assumed values of $\alpha_s$, $m_c$, $m_b$, $m_t$, CKM parameters,
$\theta_W$, etc. Of these by far the most important is $\alpha_s$, and for this we can do no better
than take the advice of the PDG. The same holds true for electroweak parameters. More
controversial are the quark masses, particularly the charm mass. Attempts to determine
the charm mass from the global fit itself [38] are complicated by the low scale and related
issues of higher twists and intrinsic charm.

The second category of theoretical uncertainty is that of missing higher order
corrections. Traditionally, when computing a specific cross-section, these are estimated by scale
variation. This method has well-known failings, and is heuristic at best, but in the context
of a global fit one is also faced with the issue of correlations: should the scale variations
in all processes be independent, or should renormalization scales be varied together, and
only factorization scales varied independently? Moreover, should factorization scales for
particular types of process, for example DIS, or Drell-Yan, or jets, be treated as correlated?

Alternative methods of estimating higher order corrections using Bayesian methods
have been developed recently [39], and may be applicable to the estimation of theoretical
uncertainties in PDFs. Meanwhile we can compare NLO and NNLO fits in order to
estimate theoretical uncertainties: this seems to indicate that in an NLO fit the uncertainty
due to missing NNLO corrections is roughly the same size as the uncertainty from the
experimental data (see Fig. 5), while in an NNLO fit the theoretical uncertainty is much
smaller. However, this could only be confirmed by performing an approximate N$^3$LO
fit, perhaps based on estimates of N$^3$LO evolution and coefficient functions derived from
resummation.

[Figure 5 panels: NNPDF3.0, $\alpha_s = 0.118$, $Q^2 = 10^4$ GeV$^2$ (both panels).]

Figure 5: An estimate of the theory uncertainty due to higher order corrections in the
NLO gluon, obtained by comparing the results of NLO and NNLO fits.